LessWrong (Curated & Popular)
[Linkpost] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman

Update: 2025-03-19
Description

This is a link post.

Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
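
The extrapolation in the summary is ordinary doubling-time arithmetic. The sketch below is only an illustration of that arithmetic, not METR's method or code: it assumes a clean exponential with the roughly 7-month doubling time quoted above, and the starting task length of 1 hour is a hypothetical placeholder, not a figure from the post. The function names are likewise made up for this example.

```python
import math

# Doubling time quoted in the summary ("around 7 months").
DOUBLING_MONTHS = 7.0


def horizon_after(months, start_hours, doubling_months=DOUBLING_MONTHS):
    """Task length (hours) after `months`, assuming pure exponential growth."""
    return start_hours * 2 ** (months / doubling_months)


def months_to_reach(target_hours, start_hours, doubling_months=DOUBLING_MONTHS):
    """Months needed to grow from `start_hours` to `target_hours`."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    start = 1.0  # hypothetical starting task length in hours (assumption)
    # One doubling period doubles the task length.
    print(horizon_after(7, start))
    # Time to go from the hypothetical 1 hour to a 40-hour work week.
    print(round(months_to_reach(40.0, start), 1))
```

Under these assumptions a fixed doubling time means each extra factor of two in task length costs the same number of months, which is why a trend that looks modest today reaches days-long or weeks-long tasks within a few years of steady growth.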

Full paper | GitHub repo

---

First published:
March 19th, 2025

Source:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks

Linkpost URL:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

---

Narrated by TYPE III AUDIO.
